A PROBABILISTIC ANALYSIS OF SOME TREE ALGORITHMS

By Hanene Mohamed and Philippe Robert 
INRIA 

In this paper a general class of tree algorithms is analyzed. It is 
shown that, by using an appropriate probabilistic representation of 
the quantities of interest, the asymptotic behavior of these algorithms 
can be obtained quite easily without resorting to the usual complex 
analysis techniques. This approach gives a unified probabilistic treat- 
ment of these questions. It simplifies and extends some of the results 
known in this domain. 

1. Introduction. A splitting algorithm is a procedure that divides recur- 
sively into subsets an initial set of n items until each of the subsets obtained 
has a cardinality strictly less than some fixed number D. These algorithms 
have a wide range of applications: 

(a) Data structures. These are algorithms on data structures used to sort 
and search. They are sometimes referred to as divide and conquer algorithms. 
See [6] and [23] for a general presentation and [28, 39, 40] for their analysis 
with analytical methods. 

(b) Communication networks. These algorithms are used to give a dis- 
tributed access to a common communication channel that can transmit only 
one message per time unit. See [3, 10, 41]. 

(c) Distributed systems. Some algorithms use a splitting technique to 
select a subset of a set of identical communicating components. See [18, 36]. 

(d) Statistical tests. A test, performed on a set of individuals, indicates 
if at least one of these individuals has some characteristics (like a disease 
if this is blood testing). The purpose is to minimize the number of tests to 
identify individuals with the specified characteristic as quickly as possible. 
See [43]. 
Formally, a splitting algorithm can be described as follows:

Splitting Algorithm $S(n)$

- Termination Condition.
If $n < D$ → Stop.

- Tree Structure.
If $n \ge D$, randomly divide $n$ into $n_1, \ldots, n_G$, with $n_1 + \cdots + n_G = n$,
where $G$ is a random variable with some fixed distribution.
→ Apply $S(n_1), S(n_2), \ldots, S(n_G)$.

1.1. Description. The algorithm starts with a set of $n$ items. This set is
randomly split into $G$ subsets; the distribution of $G$ is given by
$\mathbb{P}(G = \ell) = p_\ell$, where $(p_\ell)$ is a probability distribution on
$\{2, 3, \ldots\}$. Now, conditionally on the event $\{G = \ell\}$, for
$1 \le i \le \ell$, an item is sent into the $i$th subset with probability
$V_{i,\ell}$, where $V_\ell = (V_{i,\ell}; 1 \le i \le \ell)$ is a random probability
vector on $\{1, \ldots, \ell\}$. It can also be seen as a vector of random weights
on the $\ell$ arcs of the branching procedure on which each of the $n$ items
performs a random walk.

If $N_i$ is the cardinality of the $i$th subset, then, conditionally on the
event $\{G = \ell\}$ and on the random variables $V_{1,\ell}, \ldots, V_{\ell,\ell}$,
the distribution of the vector $(N_1, \ldots, N_\ell)$ is multinomial with
parameter $n$ and $(V_{1,\ell}, \ldots, V_{\ell,\ell})$:

$$\mathbb{P}\bigl((N_1, \ldots, N_\ell) = (m_1, \ldots, m_\ell)\bigr) = \frac{n!}{m_1! \cdots m_\ell!} \prod_{k=1}^{\ell} V_{k,\ell}^{m_k},$$

for $(m_i) \in \mathbb{N}^\ell$ such that $m_1 + \cdots + m_\ell = n$. If the $i$th
subset, $1 \le i \le \ell$, is such that $N_i < D$, the algorithm stops for this
subset. Otherwise, it is applied to the $i$th subset: a variable $G_i$, with the
same distribution as $G$, is drawn and this $i$th subset is split into $G_i$
subsets, and so on.
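The description above can be turned into a small simulation. The following is only an illustrative sketch: the function names `simulate_cost` and `knuth_weights` are hypothetical, the weight vector is supplied by a user-defined callback, and the returned value is the number of calls to $S(\cdot)$, that is, the number of nodes of the associated tree.

```python
import random


def simulate_cost(n, D, draw_weights, rng):
    """Return one sample of the number of nodes of the splitting tree of n items."""
    nodes = 0
    pending = [n]                      # sizes of the subsets still to be processed
    while pending:
        size = pending.pop()
        nodes += 1                     # every call S(size) is one node of the tree
        if size < D:                   # termination condition
            continue
        weights = draw_weights(rng)    # one draw of (V_{1,l}, ..., V_{l,l}); G = len(weights)
        counts = [0] * len(weights)
        # each item independently follows arc i with probability weights[i],
        # so counts is multinomial with parameters (size, weights)
        for i in rng.choices(range(len(weights)), weights=weights, k=size):
            counts[i] += 1
        pending.extend(counts)
    return nodes


def knuth_weights(rng):
    """Knuth's algorithm: G = 2 and V_{1,2} = V_{2,2} = 1/2 (assumed example)."""
    return [0.5, 0.5]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [simulate_cost(2, 2, knuth_weights, rng) for _ in range(20000)]
    print(sum(samples) / len(samples))   # Monte Carlo estimate of the mean cost for n = 2
```

For two items, $D = 2$ and fair binary splitting, the number of splits is geometric with mean 2, so the average printed above should be close to $1 + 2 \times 2 = 5$.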

Such a random splitting has been introduced by Devroye [7] where the 
asymptotic expansion of the depth of the associated tree is investigated. 

Examples. (i) Knuth's algorithm. When $\mathbb{P}(G = 2) = 1$, $D = 2$ and
$V_{1,2} = V_{2,2} = 1/2$, this is one of the oldest algorithms of this kind. It was
analyzed by Knuth in 1973.

(ii) Symmetrical splitting algorithm. This is the case where $V_{i,n} = 1/n$
for any $n \ge 2$ and $1 \le i \le n$.

(iii) Q-ary algorithm. If P(G = Q) = 1 and D = 2, this is the Q-ary reso- 
lution algorithm with blocked arrivals analyzed by Mathys and Flajolet [30]. 

See also [7] for other examples. Quite naturally, such an algorithm can be 
graphically represented with a tree as shown by Figure 1. 
Fig. 1. Splitting algorithm with $D = 2$, two sets of random weights $(V_{1,2}, V_{2,2})$ and
$(V_{1,3}, V_{2,3}, V_{3,3})$, $G$ a random variable with values in $\{2,3\}$ and the initial items A, B,
C, D, E and F.



Splitting measure. As will be seen in the following, the key characteristic
of this splitting algorithm is a probability distribution $W$ on $[0, 1]$ defined
with the branching distribution (the variable $G$) and the weights on each
arc [the vector $(V_{1,G}, \ldots, V_{G,G})$]. The asymptotic behavior of the algorithm
is expressed naturally in terms of the distribution $W$.

Definition 1. The splitting measure is the probability distribution $W$
on $[0, 1]$ defined by, for a nonnegative Borelian function $f$,

$$\int_0^1 f(x)\,W(dx) = \mathbb{E}\Bigl(\sum_{i=1}^{G} V_{i,G}\, f(V_{i,G})\Bigr) = \sum_{\ell=2}^{+\infty} \sum_{i=1}^{\ell} \mathbb{P}(G = \ell)\, \mathbb{E}\bigl(V_{i,\ell}\, f(V_{i,\ell})\bigr). \tag{1}$$

Assumption (A). Throughout the paper, it is assumed that, almost
surely, $G \ge 2$, and that there exists some $\delta > 0$ such that the relation

$$\sup_{\ell \ge 2}\ \sup_{1 \le i \le \ell} V_{i,\ell} \le \delta < 1 \tag{A}$$

holds almost surely; in particular $W([0,\delta]) = 1$. These conditions imply in
particular the nondegeneracy of the splitting mechanism.

Definition 2. A splitting measure $W$ is exponentially arithmetic if
there exists some $\lambda > 0$ such that

$$W(\{e^{-n\lambda} : n \ge 1\}) = 1,$$

and the largest $\lambda$ satisfying this relation is defined as the exponential span
of $W$.
If $A$ is some random variable with distribution $W$, then $W$ is exponentially
arithmetic with exponential span $\lambda$ if and only if the distribution of $-\log(A)$
is arithmetic with span $\lambda$. See [14].

Examples. (i) Knuth's algorithm, $\mathbb{P}(G = 2) = 1$, $D = 2$ and $V_{1,2} =
V_{2,2} = 1/2$.
In this case

$$W = \delta_{1/2},$$

where $\delta_x$ is the Dirac distribution at $x$ and $W$ is exponentially arithmetic
with exponential span $\log 2$.

(ii) Symmetrical splitting algorithm.

$$W = \sum_{n \ge 2} \mathbb{P}(G = n)\, \delta_{1/n};$$

the exponential span is $\log p$, where $p$ is the largest integer such that the
support of the random variable $G$ is contained in $p^{\mathbb{N}} = \{p^k : k \ge 1\}$.

(iii) Q-ary algorithm, $\mathbb{P}(G = Q) = 1$, $D = 2$, $V_{1,Q} = p_1, \ldots, V_{Q,Q} = p_Q$:

$$W(dx) = p_1 \delta_{p_1} + p_2 \delta_{p_2} + \cdots + p_Q \delta_{p_Q};$$

the distribution $W$ is exponentially arithmetic if and only if all the real
numbers $\log p_i / \log p_j$, $1 \le i < j \le Q$, are rational.

The cost of a splitting algorithm. For such an algorithm, an important
quantity is the number of operations required until the algorithm stops, that
is, when all the subsets have a cardinality strictly less than $D$. Denote
by $R_n$ this quantity when the number of initial items is $n$; then clearly:

(a) $R_n = 1$ when $n < D$;

(b) for $n \ge D$,

$$R_n \stackrel{\text{dist.}}{=} 1 + R_{1,N_1^n} + \cdots + R_{G,N_G^n}, \tag{2}$$

where, conditionally on the event $\{G = \ell\}$ and the random variables
$V_{1,\ell}, V_{2,\ell}, \ldots, V_{\ell,\ell}$,

(1) the vector $(N_1^n, \ldots, N_\ell^n)$ has a multinomial distribution with parameter
$n$ and $(V_{1,\ell}, V_{2,\ell}, \ldots, V_{\ell,\ell})$;

(2) for $(p_i) \in \mathbb{N}^\ell$, the variables $R_{1,p_1}, \ldots, R_{\ell,p_\ell}$ are independent;

(3) for $1 \le i \le \ell$, the variable $R_{i,p_i}$ has the same distribution as $R_{p_i}$.

The variable $R_n$ is simply the number of nodes of the associated tree; see
Figure 1.
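For the symmetric Q-ary algorithm (constant $G = Q$ and $V_{i,Q} = 1/Q$), taking expectations in the recurrence for $R_n$ gives a linear recursion for $a_n = \mathbb{E}(R_n)$: each subset receives a Binomial$(n, 1/Q)$ number of items, and the term $a_n$ appearing on the right-hand side (all items in the same subset) can be moved to the left. The sketch below is an illustration under these assumptions only; the helper name `expected_costs` is hypothetical.

```python
from math import comb, log


def expected_costs(n_max, Q=2, D=2):
    """Return [E(R_0), ..., E(R_{n_max})] for the symmetric Q-ary splitting algorithm."""
    if Q < 2 or D < 2:
        raise ValueError("this sketch assumes Q >= 2 and D >= 2")
    a = [1.0] * (n_max + 1)            # R_n = 1 when n < D
    for n in range(D, n_max + 1):
        # sum over k < n of P(Binomial(n, 1/Q) = k) * a[k]
        partial = sum(comb(n, k) * (Q - 1) ** (n - k) * a[k] for k in range(n)) / Q ** n
        # a_n = 1 + Q * (partial + Q^{-n} a_n), solved for a_n
        a[n] = (1.0 + Q * partial) / (1.0 - Q ** (1 - n))
    return a


if __name__ == "__main__":
    a = expected_costs(200)
    print(a[2], a[3], a[200] / 200, 2 / log(2))
```

For Knuth's algorithm the ratio $a_n/n$ printed above stays close to $2/\log 2$, the order of magnitude discussed in the next section.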




1.2. Unusual laws of large numbers. Note that, since the splitting procedure
is random, the variable $R_n$ is a random variable. With the language of
communication networks, this quantity can be thought of as the total time
to transmit $n$ initial messages. If $\mathbb{E}(R_n)$ is its expected value, $\mathbb{E}(R_n)/n$ is
the average transmission time of one message among $n$. From a probabilistic
point of view, it is natural to expect that the sequence $(R_n)$ satisfies a kind
of law of large numbers, that is, that $(\mathbb{E}(R_n)/n)$ converges to some quantity
$\alpha$. The constant $\alpha$ is, in some sense, the asymptotic average transmission
time of a message. Curiously, this law of large numbers does not always hold.
In some situations, the sequence $(\mathbb{E}(R_n)/n)$ does not converge at all and,
moreover, exhibits an oscillating behavior.

When the splitting degree is constant and equal to 2 and $V_{1,2} = V_{2,2} = 1/2$
(the items are equally divided among the two subsets), these phenomena are
quite well known. They have been analyzed using complex analysis tech- 
niques, functional transforms (and their associated inversion procedures) by 
Knuth [23], Flajolet, Gourdon and Dumas [13], Louchard and Prodinger [27] 
and many others. See [16, 28, 39] for a comprehensive treatment of this ap- 
proach. See also [8] for a survey of the domain. Robert [38] proposed an 
alternative, elementary method to get the asymptotic behavior of some re- 
lated oscillating sequences without using complex analysis. 

When the splitting degree is constant and equal to Q but the items are 
not equally divided among the subsets, studies are quite rare. Using complex 
analysis techniques, Fayolle, Flajolet and Hofri [12] obtained the asymptotic 
behavior of the associated sequence $(\mathbb{E}(R_n))$. Mathys and Flajolet [30]
presented a sketch of a generalization of this study when $Q$ is arbitrary.

Some alternative approaches. (i) Some laws of large numbers have been
proved by Devroye [8] in a quite general framework for various functionals
of the associated trees. Talagrand's concentration inequalities are the
main tools in this study. In our case, it would consist in proving that the
distribution of the random variable $R_n/\mathbb{E}(R_n)$ is sharply (with an exponential
decay) concentrated around 1. Results on limiting distributions such as
central limit theorems do not seem to be accessible with this method.

(ii) Clement, Flajolet and Vallee [5] analyzed related algorithms in the 
more general context of dynamical systems. By using a Hilbertian setting, 
they showed that the first-order behavior of the algorithms is expressed 
in terms of the spectrum of a functional operator, the transfer operator. 
Getting explicit results in this way requires therefore a good knowledge of 
some eigenvalues of the transfer operator. 

A dynamic version of this class of algorithms is investigated in [33] . The 
splitting procedure is the same but, in the language of branching processes, 
an immigration occurs at every leaf of the associated tree, that is, new items 
arrive every time unit. This dynamic feature complicates the problem. In
this case, an additional probabilistic tool has to be used: an autoregressive
process with moving average plays an important role. 

1.3. Related problems. 

Fragmentation processes. A continuous version of a splitting algorithm 
could be defined as follows: an initial mass of size $x$ is randomly split into
several pieces and, in turn, each of the pieces is randomly split, and so on. A
class of such models has been recently investigated. The fragmentation of
each mass occurs after some independent exponential time with a parame- 
ter depending, possibly, on its mass. See [2, 32] and references therein. The 
problems considered are somewhat different: regularity properties of associ- 
ated Markov processes, duality, rate of decay of individual masses, loss of 
mass, asymptotic distributions, and so on. A splitting algorithm is just a 
recursive fragmentation of an integer into integer pieces until each of the 
components has a size less than D. In a continuous setting, an analogue of 
the algorithms considered here would consist in stopping the fragmentation 
process of a mass as soon as its value is below some threshold $\varepsilon > 0$.

Random recursive decompositions. As will be (easily) seen, a splitting
algorithm can also be described as a random recursive splitting of the interval
$[0,1]$. For example, in the case of a dyadic splitting, starting from the
interval $[0,1]$, two subintervals $I_1$, $I_2$ are created and each of them is split
in turn, and so on.

These random recursive decompositions have been considered from the 
point of view of the geometry of the boundary points by various authors, to 
express the Hausdorff dimension of this set of points in particular. Mauldin 
and Williams [31] and Waymire and Williams [42] considered decompositions 
of the interval $[0, 1]$ which are not necessarily conservative, that is, when
$|I_1| + |I_2| < 1$ holds with positive probability in the dyadic case.

Hambly and Lapidus [15] and Falconer [11] considered decompositions of 
the interval [0, 1] from the point of view of the lengths of the associated 
subintervals. The interval $[0, 1]$ is represented by a nonincreasing sequence
$(L_n)$ whose sum is 1. For $n \ge 1$, $L_n$ is the length of the $n$th largest interval of
the decomposition. This description is similar to the classical representation
of fragmentation processes. See [34].

In this setting, multiplicative cascades and martingales introduced by 
Mandelbrot [29] and Kahane and Peyriere [19] show up quite naturally. They 
have been analyzed quite extensively; see [1, 25] and references therein. 

1.4. An overview. The purpose of this paper is twofold. First, it consid- 
ers splitting algorithms with a random (and possibly unbounded) degree of 
splitting generalizing the previous studies in this domain. Second, and this 
is in fact the main point of the paper, it proposes a probabilistic approach
that greatly simplifies the analysis of these algorithms. Moreover, as a
by-product, a new direct representation of the asymptotic oscillating behavior
is established.

The analysis proposed in this paper also starts from (2), but its treatment 
is significantly different from the analytic approach. After some transfor- 
mation, (2) is interpreted as a probabilistic equation which is iterated by 
using appropriate independent random variables. Following the method of 
Robert [38], the next step is to perform a probabilistic de-Poissonization
and, by using Fubini's theorem conveniently, to represent the quantity $\mathbb{E}(R_n)$
by using a Poisson point process on the real line. The final, crucial step,
which differs from [38], consists in using the key renewal theorem to get the
asymptotic behavior of the sequence $(\mathbb{E}(R_n))$.

The approach is elementary; its main advantage over the analytic treat- 
ment lies certainly in the use of the renewal theorem which gives directly 
the asymptotic behavior. 

Results of the paper. Section 2 gives a useful representation for the aver- 
age cost of the algorithm. The main result of the paper for the asymptotic 
cost is the following theorem in Section 3. This is a summary of Propositions 
9 and 11. 



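Theorem 3 (Asymptotics of the average cost). For a splitting algorithm,
under the condition

$$\int_0^1 \frac{|\log(y)|}{y}\, W(dy) < +\infty, \tag{3}$$

- if the splitting measure $W$ is not exponentially arithmetic, then

$$\lim_{n \to +\infty} \frac{\mathbb{E}(R_n)}{n} = \frac{\mathbb{E}(G)}{(D-1) \int_0^1 |\log(y)|\, W(dy)}; \tag{4}$$

- if the splitting measure $W$ is exponentially arithmetic with exponential
span $\lambda > 0$, as $n$ gets large, the equivalence

$$\frac{\mathbb{E}(R_n)}{n} \sim F\Bigl(\frac{\log n}{\lambda}\Bigr) \tag{5}$$

holds, where $F$ is the periodic function with period 1 defined by, for $x \ge 0$,

$$F(x) = \frac{\mathbb{E}(G)}{\int_0^1 |\log(y)|\, W(dy)}\, \frac{\lambda}{1 - e^{-\lambda}} \int_0^{+\infty} \exp\Bigl(-\lambda \Bigl\{x - \frac{\log y}{\lambda}\Bigr\}\Bigr) \frac{y^{D-2}}{(D-1)!}\, e^{-y}\, dy,$$

and $\{z\} = z - \lfloor z \rfloor$ is the fractional part of $z \in \mathbb{R}$.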
Condition (3) is not really restrictive since the variable G is bounded
in practice. This theorem covers and extends some of the results in this 
domain: for Knuth's algorithm [23] and for Q-ary algorithms with blocked 
arrivals [30], see Corollaries 10 and 12. 

Furthermore, when there are asymptotic periodic oscillations, the peri- 
odic function F involved is expressed directly and not in terms of its Fourier 
coefficients as is usually the case. The expression of F generalizes the rep- 
resentation of [38] obtained for Knuth's algorithm. 

The distribution of the sequence $(R_n)$ (and not only its average) is investigated
in Section 4. For simplicity, only the case where the variable $G$ is
constant and the variables $V_{\cdot,G}$ are equal to $1/G$ is considered. The purpose
of this section is to show that the distribution of the Poisson transform of 
the sequence and, more generally, the distribution of most of the functionals 
of the associated tree, can be expressed quite simply in terms of Poisson 
processes and uniformly distributed random variables. 

Two representations of the distribution of the Poisson transform as a 
functional of Poisson processes are derived. As a consequence, a law of large 
numbers is proved when the number of initial items has a Poisson distribu- 
tion (Poisson transform). Moreover, the asymptotic oscillating behavior of 
the algorithm is proved as a consequence of a standard law of large numbers. 
These unusual laws of large numbers are, in the end, in the realm of classical 
laws of large numbers. 

The central limit theorem is also proved with a similar method in this case. 
This is a classical result (see [28]); it is usually proved with complex analysis
methods via quite technical estimations. It is proved here as a consequence 
of the standard central limit theorem for independent random variables. At 
the same time, a new representation of the asymptotic variance is obtained. 

2. General properties. Throughout this paper, $(\mathcal{N}([0,x]))$ denotes a Poisson
process with intensity 1; equivalently it can also be described as a
nondecreasing sequence $(t_n)$ such that $(t_{n+1} - t_n)$ is a sequence of i.i.d. random
variables exponentially distributed with parameter 1. For $x \ge 0$, the variable
$\mathcal{N}([0,x])$ is simply the number of $t_n$'s in the interval $[0,x]$. See [20] for basic
results on Poisson processes.

Equation (2) and the boundary conditions for the sequence $(R_n)$ are
summarized in the following relation, for $n \ge 0$:

$$R_n \stackrel{\text{dist.}}{=} 1 + R_{1,N_1^n} + \cdots + R_{G,N_G^n} - G\, 1_{\{n < D\}};$$

therefore,

$$R_n - 1 \stackrel{\text{dist.}}{=} \sum_{i=1}^{G} (R_{i,N_i^n} - 1) + G\, 1_{\{n \ge D\}}. \tag{6}$$
Definition 4. The Poisson transform of a nonnegative sequence $(a_n)$
is defined as

$$\sum_{n \ge 0} a_n \frac{x^n}{n!}\, e^{-x} = \mathbb{E}(a_{\mathcal{N}([0,x])}). \tag{7}$$

The following proposition gives useful representations of the Poisson
transform of the sequence $(\mathbb{E}(R_n))$.



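Proposition 5 [Poisson transform of the sequence $(R_n)$]. For $x \ge 0$,

$$\mathbb{E}(R_{\mathcal{N}([0,x])}) = 1 + \mathbb{E}(G)\, \mathbb{E}\Bigl(\sum_{i=0}^{+\infty} \frac{1}{\prod_{k=1}^{i} W_k}\, 1_{\{t_D \le x \prod_{k=1}^{i} W_k\}}\Bigr), \tag{8}$$

where $(W_i)$ is an i.i.d. sequence of random variables with distribution $W$.

Proof. If $n$ is a Poisson random variable with parameter $x$, the splitting
property of Poisson variables (see [20], e.g.) shows that, conditionally on the
event $\{G = \ell\}$ and on the variables $V_{1,\ell}, \ldots, V_{\ell,\ell}$, the variables $N_i^n$,
$1 \le i \le \ell$, are independent and $N_i^n$ has a Poisson distribution with parameter
$x V_{i,\ell}$. Consequently, for $x > 0$, if

$$\Phi(x) = \frac{\mathbb{E}(R_{\mathcal{N}([0,x])}) - 1}{x\, \mathbb{E}(G)}, \tag{9}$$

it is easily checked that $\mathbb{E}(G)\Phi(x) \to R_1 - R_0 = 0$ as $x \searrow 0$.
Since $\{\mathcal{N}([0,x]) \ge D\} = \{t_D \le x\}$, (6) gives the relation

$$\Phi(x) = \sum_{\ell=2}^{+\infty} \mathbb{P}(G = \ell)\, \mathbb{E}\Bigl(\sum_{i=1}^{\ell} V_{i,\ell}\, \Phi(x V_{i,\ell})\Bigr) + \frac{1}{x}\, \mathbb{P}(t_D \le x). \tag{10}$$

Equation (10) can then be rewritten as

$$\Phi(x) = \mathbb{E}(\Phi(x W_1)) + \mathbb{E}\Bigl(\frac{1}{x}\, 1_{\{t_D \le x\}}\Bigr). \tag{11}$$

The iteration of (11) shows that, for $n \ge 1$,

$$\Phi(x) = \mathbb{E}\Bigl(\Phi\Bigl(x \prod_{k=1}^{n} W_k\Bigr)\Bigr) + \sum_{i=0}^{n-1} \mathbb{E}\Bigl(\frac{1}{x \prod_{k=1}^{i} W_k}\, 1_{\{t_D \le x \prod_{k=1}^{i} W_k\}}\Bigr).$$

The assumption on the variable $G$ and the sequence of vectors $(V_n)$ implies
that, almost surely, the sequence $(\prod_{k=1}^{n} W_k)$ converges to 0. The function $\Phi$
can thus be represented as

$$\Phi(x) = \mathbb{E}\Bigl(\sum_{i=0}^{+\infty} \frac{1}{x \prod_{k=1}^{i} W_k}\, 1_{\{t_D \le x \prod_{k=1}^{i} W_k\}}\Bigr).$$

The proposition has been proved. □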
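As a numerical sanity check of this representation, consider Knuth's algorithm ($G = 2$, $D = 2$, $W_k = 1/2$). The representation of Proposition 5 then reads $1 + 2\sum_{i \ge 0} 2^i\, \mathbb{P}(t_2 \le x 2^{-i})$ with $\mathbb{P}(t_2 \le y) = 1 - (1+y)e^{-y}$, and it can be compared with the Poisson transform of Definition 4 applied to the exact values of $\mathbb{E}(R_n)$. The sketch below is illustrative only; the function names are hypothetical, and both series are truncated.

```python
from math import comb, exp, expm1


def expected_costs(n_max):
    """[E(R_0), ..., E(R_{n_max})] for Knuth's algorithm (Q = 2, D = 2)."""
    a = [1.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        partial = sum(comb(n, k) * a[k] for k in range(n)) / 2 ** n
        a[n] = (1.0 + 2.0 * partial) / (1.0 - 2.0 ** (1 - n))
    return a


def poisson_transform(a, x):
    """Sum of a_n x^n e^{-x} / n!, truncated to the available terms of the sequence."""
    total, weight = 0.0, exp(-x)       # weight = x^n e^{-x} / n!
    for n, a_n in enumerate(a):
        total += a_n * weight
        weight *= x / (n + 1)
    return total


def poissonized_cost(x, depth=80):
    """1 + 2 * sum_i 2^i P(t_2 <= x 2^{-i}), truncated after `depth` terms."""
    def gamma2_cdf(y):
        # 1 - (1 + y) e^{-y}, written to avoid cancellation for small y
        return -expm1(-y) - y * exp(-y)
    return 1.0 + 2.0 * sum(2 ** i * gamma2_cdf(x / 2 ** i) for i in range(depth))


if __name__ == "__main__":
    a = expected_costs(200)
    for x in (0.5, 3.0, 10.0):
        print(x, poisson_transform(a, x), poissonized_cost(x))
```

The two columns printed for each $x$ should agree up to rounding and truncation errors.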

From now on, throughout the paper, $(W_i)$ will denote an i.i.d. sequence
of random variables on $[0, 1]$ with distribution $W$.

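Proposition 6 (Probabilistic de-Poissonization). For $n \ge D$,

$$\mathbb{E}(R_n) = 1 + \mathbb{E}(G)\, \mathbb{E}\Bigl(\sum_{i=0}^{T(U^n_{(D)}) - 1} \frac{1}{\prod_{k=1}^{i} W_k}\Bigr), \tag{12}$$

where, for $0 < y < 1$,

$$T(y) = \inf\Bigl\{i \ge 1 : \prod_{k=1}^{i} W_k < y\Bigr\}$$

and $U^n_{(D)}$ is the $D$th smallest variable of $n$ independent, uniformly distributed
random variables on $[0,1]$ independent of $(W_i)$.

Proof. For $x > 0$, by decomposing with respect to the number of points
of the Poisson process $\mathcal{N}$ in the interval $[0,x]$, one gets, for $0 < a < 1$,

$$\mathbb{P}(t_D \le xa) = \sum_{n=D}^{+\infty} \mathbb{P}(t_D \le xa,\ \mathcal{N}([0,x]) = n) = \sum_{n=D}^{+\infty} \mathbb{P}(t_D \le xa \mid \mathcal{N}([0,x]) = n)\, \mathbb{P}(\mathcal{N}([0,x]) = n).$$

For $n \ge D$, conditionally on the event $\{\mathcal{N}([0,x]) = n\}$, the variable $t_D$ has
the same distribution as the $D$th smallest random variable of $n$ uniformly
distributed random variables on $[0,x]$. When $x = 1$, denote by $U^n_{(D)}$ a variable
with this conditional distribution. Clearly, by homogeneity, the variable
$(t_D \mid \mathcal{N}([0,x]) = n)$ has the same distribution as $x U^n_{(D)}$. Finally, one gets the
identity

$$\mathbb{P}(t_D \le xa) = \sum_{n=D}^{+\infty} \mathbb{P}(U^n_{(D)} \le a)\, \frac{x^n}{n!}\, e^{-x} = \mathbb{E}\Bigl(\sum_{n=D}^{+\infty} 1_{\{U^n_{(D)} \le a\}}\, \frac{x^n}{n!}\, e^{-x}\Bigr).$$

By using the independence of the sequence $(W_i)$ and $t_D$ in (8), the last
identity gives the relation

$$\mathbb{E}(R_{\mathcal{N}([0,x])}) = 1 + \mathbb{E}(G) \sum_{n=D}^{+\infty} \frac{x^n}{n!}\, e^{-x}\, \mathbb{E}\Bigl(\sum_{i=0}^{+\infty} \frac{1}{\prod_{k=1}^{i} W_k}\, 1_{\{U^n_{(D)} \le \prod_{k=1}^{i} W_k\}}\Bigr).$$

By Fubini's theorem and writing $1 = \exp(x)\exp(-x)$, this expression can be
rewritten as

$$\mathbb{E}(R_{\mathcal{N}([0,x])}) = \sum_{n=0}^{D-1} \frac{x^n}{n!}\, e^{-x} + \sum_{n=D}^{+\infty} \frac{x^n}{n!}\, e^{-x} \Bigl(1 + \mathbb{E}(G)\, \mathbb{E}\Bigl(\sum_{i=0}^{+\infty} \frac{1}{\prod_{k=1}^{i} W_k}\, 1_{\{U^n_{(D)} \le \prod_{k=1}^{i} W_k\}}\Bigr)\Bigr).$$

The identification, through (7), of $\mathbb{E}(R_{\mathcal{N}([0,x])})$ and the last identity gives (12). □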

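Corollary 7 (Symmetrical Q-ary algorithm). When $\mathbb{P}(G = Q) = 1$
holds and $V_{i,Q} = 1/Q$, for $i = 1, \ldots, Q$, then for $n \ge D$,

$$\mathbb{E}(R_n) = 1 + \frac{Q}{Q-1} \Bigl(\mathbb{E}\bigl(Q^{\lceil -\log_Q(U^n_{(D)}) \rceil}\bigr) - 1\Bigr) \tag{13}$$

with $\lceil y \rceil = \lfloor y \rfloor + 1$ and, for $0 \le x \le 1$,

$$\mathbb{P}(U^n_{(D)} \ge x) = \sum_{k=0}^{D-1} \binom{n}{k} x^k (1-x)^{n-k}.$$

From (13), by using the fact that $n U^n_{(D)}$ converges in distribution as $n$
tends to infinity, it is not difficult to get the asymptotic behavior of $\mathbb{E}(R_n)$.
The general case, (12), is slightly more complicated. One has to study the
asymptotics of the series inside the expectation.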
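The symmetric case lends itself to an exact numerical check. With $W_k = 1/Q$, the sum inside the expectation of the de-Poissonized formula is geometric, which gives $\mathbb{E}(R_n) = 1 + \frac{Q}{Q-1}(\mathbb{E}(Q^{T}) - 1)$, where $T = j$ exactly when $Q^{-j} < U^n_{(D)} \le Q^{-(j-1)}$. The sketch below (hypothetical helper names, $D = 2$ only, series truncated after `depth` terms) compares this expression with the values obtained from the recurrence for $\mathbb{E}(R_n)$.

```python
from math import comb


def expected_costs(n_max, Q=2):
    """[E(R_0), ..., E(R_{n_max})] for the symmetric Q-ary algorithm with D = 2."""
    a = [1.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        partial = sum(comb(n, k) * (Q - 1) ** (n - k) * a[k] for k in range(n)) / Q ** n
        a[n] = (1.0 + Q * partial) / (1.0 - Q ** (1 - n))
    return a


def prob_second_smallest_le(n, x):
    """P(U_(2) <= x): at least two of the n uniform variables fall below x."""
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(2, n + 1))


def cost_from_order_statistic(n, Q=2, depth=60):
    """1 + Q/(Q-1) * (E(Q^T) - 1), with T = j iff Q^-j < U_(2) <= Q^-(j-1)."""
    mean_q_pow_t = sum(
        Q ** j * (prob_second_smallest_le(n, Q ** (1 - j)) - prob_second_smallest_le(n, Q ** -j))
        for j in range(1, depth + 1)
    )
    return 1.0 + Q / (Q - 1) * (mean_q_pow_t - 1.0)


if __name__ == "__main__":
    exact = expected_costs(10)
    for n in range(2, 11):
        print(n, exact[n], cost_from_order_statistic(n))
```

Both columns should coincide up to rounding, which illustrates that the de-Poissonization step is an identity and not an approximation.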

2.1. A functional integral equation. If $R(x) = \mathbb{E}(R_{\mathcal{N}([0,x])})$ denotes the
expected value of the Poisson transform of the sequence $(R_n)$, then (6) gives
the relation

$$R(x) = \sum_{\ell=2}^{+\infty} \mathbb{P}(G = \ell)\, \mathbb{E}\Bigl(\sum_{i=1}^{\ell} R(x V_{i,\ell})\Bigr) + h(x),$$

by denoting

$$h(x) = 1 - \mathbb{E}(G) \int_x^{+\infty} \frac{u^{D-1}}{(D-1)!}\, e^{-u}\, du;$$

it is easy to see that the above identity can be written as the following
integral equation:

$$R(x) = \int_0^1 R(xu)\, \frac{W(du)}{u} + h(x). \tag{14}$$

Recall that $W$ is some probability distribution on the interval $[0, 1]$. For the
Q-ary protocol considered by Mathys and Flajolet [30], this equation is

$$R(x) = \sum_{i=1}^{Q} R(x p_i) + h(x).$$






It is analyzed by considering the Mellin transform $R^*(s)$ of $R(x)$ on some
vertical strip $S$ of $\mathbb{C}$,

$$R^*(s) = \int_0^{+\infty} R(u)\, u^{s-1}\, du, \qquad s \in S,$$

which, in this case, is given by

$$R^*(s) = \frac{h^*(s)}{1 - \sum_{i=1}^{Q} p_i^{-s}}.$$

The analytical approach consists in analyzing the poles of $R^*(s)$ on the
right-hand side of $S$, basically the solutions with positive real part of the
equation

$$\sum_{i=1}^{Q} p_i^{-s} = 1.$$

Then, by inverting the Mellin transform and using complex analysis
techniques, the asymptotic behavior of $(R(x))$ at infinity is described in terms
of these poles. The final step, an analytic inversion of the Poisson transform
together with technical estimates, establishes a relation between the
asymptotic behaviors of the function $x \to R(x)$ and of the sequence $(R_n)$.

In the general case considered here, (14) gives the following expression for
the Mellin transform of $(R(x))$:

$$R^*(s) = h^*(s) \Big/ \Bigl(1 - \int_0^1 u^{-s-1}\, W(du)\Bigr).$$

An analogue of the analytic approach would start with the study of the roots
$s \in \mathbb{C}$, $\Re(s) > 0$, of the equation

$$\int_0^1 u^{-s-1}\, W(du) = 1, \tag{15}$$

and, if possible, proceed with successive inversions of Mellin transform and
Poisson transform.

As will be seen, our direct approach reduces to the minimum the technical
apparatus required for such an analysis. The Poisson transform of $(R_n)$
is also used in our method, but it is conveniently represented [see (8)] so that
it can be inverted right away to give an explicit expression (12) for $\mathbb{E}(R_n)$,
which directly gives the asymptotic behavior of the sequence $(\mathbb{E}(R_n))$.

Interestingly, if $(L_n)$ denotes the nonincreasing sequence of the lengths of
the subintervals of $[0, 1]$ associated to the splitting procedure (see
Section 1.3), the zeta function of the string $(L_n)$ is defined as the meromorphic
function

$$\zeta(s) = \sum_{n \ge 1} L_n^s;$$

see [15] and [24]. It is not difficult to see that the relation

$$\mathbb{E}(\zeta(s)) = \int_0^1 u^{s-1}\, W(du) \Big/ \Bigl(1 - \int_0^1 u^{s-1}\, W(du)\Bigr)$$

holds. In particular, the poles of the zeta function of the associated random
recursive string can be expressed in terms of the solutions of (15).

3. Analysis of the asymptotic average cost. 

An associated random walk. If $(W_i)$ is an i.i.d. sequence with common
distribution $W$ defined by (1), the sequence $(B_i) = (-\log(W_i))$ is an i.i.d.
sequence of nonnegative random variables. The random walk $(S_n)$ is
associated to $(B_i)$,

$$S_n = B_1 + B_2 + \cdots + B_n, \qquad n \ge 0.$$

As will be seen, the asymptotic behavior of the splitting algorithm depends
a great deal on the distribution of $(B_i)$. For $x \ge 0$, the crossing time $\nu_x$ of
level $x$ by $(S_n)$ is defined as

$$\nu_x = \inf\{n : S_n > x\}.$$

For $0 < y < 1$, the variable $T(y)$ of Proposition 6 is simply $\nu_{-\log(y)}$. See [9, 14]
for the main results concerning renewal theory used in the following.
If $\Psi$ is defined as

$$\Psi(x) = \mathbb{E}\Bigl(\sum_{i=0}^{\nu_x - 1} e^{S_i}\Bigr),$$

then by (12),

$$\mathbb{E}(R_n) = 1 + \mathbb{E}(G)\, \mathbb{E}\bigl[\Psi(-\log(U^n_{(D)}))\bigr]. \tag{16}$$

It is clear that $-\log(U^n_{(D)})$ converges in distribution to $+\infty$ as $n$ goes to
infinity. The asymptotic behavior of $\Psi$ at infinity is first analyzed; this function
can be rewritten as

$$\Psi(x)\, e^{-x} = \mathbb{E}\Bigl(\sum_{i=0}^{\nu_x - 1} e^{S_i - x}\Bigr) = \mathbb{E}\Bigl(\sum_{i=1}^{\nu_x} e^{S_{\nu_x - i} - x}\Bigr). \tag{17}$$

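3.1. The nonarithmetical case. In this part, it is assumed that the
distribution of $W_1$ is not exponentially arithmetic. See Definition 2.

Lemma 8. Under the condition

$$\mathbb{E}\Bigl(\frac{|\log(W_1)|}{W_1}\Bigr) = \int_0^1 \frac{|\log(y)|}{y}\, W(dy) < +\infty,$$

the relation

$$\sup_{x \ge 0} \mathbb{E}\bigl(e^{S_{\nu_x} - x}\bigr) < +\infty$$

holds.

Proof. Lorden's inequalities (see [4, 26]) show that, for any $p > 0$,

$$\sup_{x \ge 0} \mathbb{E}\bigl((S_{\nu_x} - x)^p\bigr) \le \frac{p+2}{p+1}\, \frac{\mathbb{E}(B_1^{p+1})}{\mathbb{E}(B_1)};$$

thus, one gets the relation

$$\sup_{x \ge 0} \mathbb{E}\bigl(e^{S_{\nu_x} - x}\bigr) \le \frac{1}{\mathbb{E}(B_1)} \int_0^{+\infty} (u + 2)\, e^{u}\, \mathbb{P}(S_1 \ge u)\, du = \frac{\mathbb{E}\bigl((S_1 + 1)\, e^{S_1}\bigr) - 1}{\mathbb{E}(B_1)} = \frac{1}{\mathbb{E}(-\log W_1)} \Bigl(\mathbb{E}\Bigl(\frac{-\log(W_1) + 1}{W_1}\Bigr) - 1\Bigr) < +\infty. \qquad \square$$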

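For $i \ge 1$, the renewal theorem shows that, when $x$ goes to infinity, the
variable $S_{\nu_x - i} - x$ converges in distribution to $-(\tau^* + \tau_1 + \tau_2 + \cdots + \tau_{i-1})$,
where the variables $(\tau_k)$ are i.i.d. distributed as $B_1$ and independent of $\tau^*$
whose distribution is given by

$$\mathbb{E}(f(\tau^*)) = \frac{1}{\mathbb{E}(B_1)} \int_0^{+\infty} f(u)\, \mathbb{P}(B_1 \ge u)\, du,$$

for any nonnegative Borelian function $f$ on $\mathbb{R}$. By Assumption (A), the
increments of the random walk $(S_n)$ are bounded below by $-\log(\delta)$; therefore
one gets the relation, for $1 \le K \le \nu_x$,

$$\sum_{i=K}^{\nu_x} e^{S_{\nu_x - i} - x} \le e^{S_{\nu_x} - x} \sum_{i=K}^{\nu_x} \delta^{i}.$$

From Lemma 8 and (17), one deduces then

$$\lim_{x \to +\infty} \Psi(x)\, e^{-x} = \mathbb{E}\Bigl(\sum_{i=1}^{+\infty} \exp(-\tau^* - \tau_1 - \tau_2 - \cdots - \tau_{i-1})\Bigr) = \frac{1 - \mathbb{E}(\exp(-\tau_1))}{\mathbb{E}(\tau_1)}\, \frac{1}{1 - \mathbb{E}(\exp(-\tau_1))} = \frac{1}{-\mathbb{E}(\log(W_1))}, \tag{18}$$

since the density of $\tau^*$ on $\mathbb{R}_+$ is given by

$$\mathbb{P}(\tau_1 \ge x)/\mathbb{E}(\tau_1), \qquad x \ge 0.$$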
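Proposition 9 (Convergence of averages). If the distribution of $W_1$ is
not exponentially arithmetic and such that

$$\mathbb{E}\Bigl(\frac{|\log(W_1)|}{W_1}\Bigr) < +\infty,$$

then the following convergence holds:

$$\lim_{n \to +\infty} \frac{\mathbb{E}(R_n)}{n} = \frac{\mathbb{E}(G)}{(D-1)\, \mathbb{E}(-\log W_1)}.$$

Proof. Equation (16) gives that, for $n \ge 1$,

$$\frac{\mathbb{E}(R_n)}{n} = \frac{1}{n} + \mathbb{E}(G)\, \mathbb{E}\Bigl(\Psi[-\log(U^n_{(D)})]\, \exp(\log(U^n_{(D)}))\, \frac{1}{n U^n_{(D)}}\Bigr).$$

As $n$ goes to infinity the variable $n U^n_{(D)}$ converges in distribution to a random
variable $t_D$ which is a sum of $D$ i.i.d. exponential random variables with
parameter 1; furthermore,

$$\lim_{n \to +\infty} \mathbb{E}\Bigl(\frac{1}{n U^n_{(D)}}\Bigr) = \mathbb{E}\Bigl(\frac{1}{t_D}\Bigr) = \frac{1}{D-1}.$$

For $\varepsilon > 0$, there exists $K$ such that, for $x \ge K$,
$|\Psi(x)\exp(-x) + 1/\mathbb{E}(\log W_1)| \le \varepsilon$; if $C$ denotes the supremum of
$x \to \Psi(x)\exp(-x)$ on $\mathbb{R}_+$, then

$$\Bigl|\mathbb{E}\Bigl(\Psi[-\log(U^n_{(D)})]\, \exp(\log(U^n_{(D)}))\, \frac{1}{n U^n_{(D)}}\Bigr) - \frac{1}{\mathbb{E}(-\log W_1)(D-1)}\Bigr| \le \varepsilon\, \mathbb{E}\Bigl(\frac{1}{n U^n_{(D)}}\Bigr) + \Bigl(C + \frac{1}{\mathbb{E}(-\log W_1)}\Bigr) \mathbb{E}\Bigl(1_{\{U^n_{(D)} \ge \exp(-K)\}}\, \frac{1}{n U^n_{(D)}}\Bigr) + \frac{1}{\mathbb{E}(-\log W_1)} \Bigl|\mathbb{E}\Bigl(\frac{1}{n U^n_{(D)}}\Bigr) - \frac{1}{D-1}\Bigr|. \tag{19}$$

For $K_2 > 0$,

$$\limsup_{n \to +\infty} \mathbb{E}\Bigl(1_{\{U^n_{(D)} \ge \exp(-K)\}}\, \frac{1}{n U^n_{(D)}}\Bigr) \le \limsup_{n \to +\infty} \mathbb{E}\Bigl(1_{\{n U^n_{(D)} \ge K_2 \exp(-K)\}}\, \frac{1}{n U^n_{(D)}}\Bigr) = \mathbb{E}\Bigl(1_{\{t_D \ge K_2 \exp(-K)\}}\, \frac{1}{t_D}\Bigr),$$

and this term goes to 0 as $K_2$ tends to infinity. One concludes that the
right-hand side of (19) is arbitrarily small as $n$ goes to infinity. The proposition
is proved. □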
Corollary 10. (1) Q-ary protocol with blocked arrivals. When $D = 2$,
$G \equiv Q$ and $V_{i,Q} = p_i$ for $1 \le i \le Q$, then, if at least one of the real numbers
$\log p_i / \log p_1$, $2 \le i \le Q$, is not rational, the convergence

$$\lim_{n \to +\infty} \frac{\mathbb{E}(R_n)}{n} = \frac{Q}{-\sum_{i=1}^{Q} p_i \log p_i}$$

holds.

(2) Symmetrical case. If $G$ is not a degenerate random variable such
that $\mathbb{E}(\log G) < +\infty$ and, for $\ell \ge 2$ and $1 \le i \le \ell$, $V_{i,\ell} = 1/\ell$, then

$$\lim_{n \to +\infty} \frac{\mathbb{E}(R_n)}{n} = \frac{\mathbb{E}(G)}{(D-1)\, \mathbb{E}(\log G)}.$$
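3.2. The arithmetical case. It is assumed that the distribution of $W_1$ is
exponentially arithmetic with exponential span $\lambda > 0$. The law of $-\log(W_1)/\lambda$
is a probability distribution on $\mathbb{N}$. For $i \ge 1$, one defines $C_i = B_i/\lambda =
-\log(W_i)/\lambda$. In the arithmetic case, the integer-valued random walk
associated to $(C_i)$ plays the key role, much in the same way as $(S_n)$ in the
nonarithmetic case. By denoting

$$\tau_n = \inf\Bigl\{k \ge 1 : \sum_{i=1}^{k} C_i \ge n\Bigr\},$$

(17) can be rewritten as, for $x \ge 0$,

$$\Psi(x)\, e^{-\lambda \lceil x/\lambda \rceil} = \mathbb{E}\Bigl(\sum_{i=1}^{\tau_{\lceil x/\lambda \rceil}} \exp\Bigl(\lambda \Bigl(\sum_{k=1}^{\tau_{\lceil x/\lambda \rceil} - i} C_k - \lceil x/\lambda \rceil\Bigr)\Bigr)\Bigr),$$

where $\lceil y \rceil = \inf\{n \in \mathbb{N} : n > y\}$ for $y \ge 0$. By using the discrete renewal
theorem, for $i \ge 1$, as $n$ goes to infinity, the variable $C_1 + \cdots + C_{\tau_n - i} - n$
converges in distribution to $-(C_1^* + C_2 + \cdots + C_i)$, where $C_1^*$ is an independent
random variable whose distribution is given by

$$\mathbb{P}(C_1^* = n) = \frac{1}{\mathbb{E}(C_1)}\, \mathbb{P}(C_1 \ge n), \qquad n \ge 1.$$

With the same method as in the nonarithmetic case, if the variable
$|\log(W_1)|/W_1$ is integrable, then

$$\lim_{x \to +\infty} \Psi(x)\, e^{-\lambda \lceil x/\lambda \rceil} = \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}}\, \frac{1}{\mathbb{E}(|\log(W_1)|)}.$$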

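Proposition 11 (Asymptotic periodic oscillations). If the distribution
of $W_1$ is exponentially arithmetic with exponential span $\lambda > 0$, and such that

$$\mathbb{E}\Bigl(\frac{|\log(W_1)|}{W_1}\Bigr) < +\infty,$$

then, as $n$ gets large, the equivalence

$$\frac{\mathbb{E}(R_n)}{n} \sim F\Bigl(\frac{\log n}{\lambda}\Bigr)$$

holds, where $F$ is the periodic function with period 1 defined by, for $x \ge 0$,

$$F(x) = \frac{\mathbb{E}(G)}{\mathbb{E}(|\log(W_1)|)}\, \frac{\lambda}{1 - e^{-\lambda}} \int_0^{+\infty} \exp\Bigl(-\lambda \Bigl\{x - \frac{\log y}{\lambda}\Bigr\}\Bigr) \frac{y^{D-2}}{(D-1)!}\, e^{-y}\, dy,$$

and $\{x\} = x - \lfloor x \rfloor$.

Proof. For $n \ge 1$, if $\lceil x \rceil = \lfloor x \rfloor + 1$,

$$\frac{1}{n}\, \mathbb{E}\bigl[\Psi(-\log(U^n_{(D)}))\bigr] = \mathbb{E}\Bigl(\Psi(-\log U^n_{(D)})\, e^{-\lambda \lceil -\log(U^n_{(D)})/\lambda \rceil} \exp\Bigl(-\lambda \Bigl(\frac{-\log(U^n_{(D)})}{\lambda} - \Bigl\lceil \frac{-\log(U^n_{(D)})}{\lambda} \Bigr\rceil\Bigr)\Bigr) \frac{1}{n U^n_{(D)}}\Bigr);$$

since $n U^n_{(D)}$ converges in distribution to $t_D$ as $n$ goes to infinity, with the
same method as in the proof of Proposition 9, one gets the equivalences

$$\frac{\mathbb{E}\bigl[\Psi(-\log(U^n_{(D)}))\bigr] \times \mathbb{E}(|\log(W_1)|)}{n} \sim \frac{\lambda}{1 - e^{-\lambda}}\, \mathbb{E}\Bigl(\exp\Bigl(-\lambda \Bigl\{\frac{\log(n)}{\lambda} - \frac{\log(n U^n_{(D)})}{\lambda}\Bigr\}\Bigr) \frac{1}{n U^n_{(D)}}\Bigr) \sim \frac{\lambda}{1 - e^{-\lambda}}\, \mathbb{E}\Bigl(\exp\Bigl(-\lambda \Bigl\{\frac{\log(n)}{\lambda} - \frac{\log(t_D)}{\lambda}\Bigr\}\Bigr) \frac{1}{t_D}\Bigr).$$

One concludes by using (16). □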



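Corollary 12 (Q-ary protocol with blocked arrivals). When $D = 2$,
$G \equiv Q$ and $V_{i,Q} = p_i$ for $1 \le i \le Q$, then, if all the real numbers
$\log p_i / \log p_1$, $2 \le i \le Q$, are rational, the equivalence

$$\frac{\mathbb{E}(R_n)}{n} \sim F\Bigl(\frac{\log n}{\lambda}\Bigr)$$

holds, where $F$ is the periodic function with period 1 defined by, for $x \ge 0$,

$$F(x) = \frac{Q}{-\sum_{i=1}^{Q} p_i \log p_i}\, \frac{\lambda}{1 - e^{-\lambda}} \int_0^{+\infty} \exp\Bigl(-\lambda \Bigl\{x - \frac{\log y}{\lambda}\Bigr\}\Bigr) e^{-y}\, dy,$$

where $\{x\} = x - \lfloor x \rfloor$ and $\lambda = \sup\{y > 0 : \forall i \in \{1, \ldots, Q\},\ \log p_i \in y\mathbb{Z}\}$.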
4. The distributions of the symmetrical Q-ary algorithm. From now on,
it is assumed that the branching degree of the splitting algorithm is constant,
that is, $\mathbb{P}(G = Q) = 1$, and uniform, $V_{i,Q} = 1/Q$ for $1 \le i \le Q$. A set of $n \ge D$
items is randomly, equally divided into $Q$ subsets. From Proposition 11, it
is known that

$$\mathbb{E}(R_n)/n \sim F_1(\log_Q n)$$

as $n$ goes to infinity, with

$$F_1(x) = \frac{Q^2}{Q-1} \int_0^{+\infty} Q^{-\{x - \log_Q y\}}\, \frac{y^{D-2}}{(D-1)!}\, e^{-y}\, dy. \tag{20}$$

This is a typical case where a regular law of large numbers does not hold.
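The periodic function can be evaluated numerically. For $D = 2$, Proposition 11 with $\lambda = \log Q$, $\mathbb{E}(G) = Q$ and $\mathbb{E}(|\log W_1|) = \log Q$ gives $F_1(x) = \frac{Q^2}{Q-1} \int_0^{+\infty} Q^{-\{x - \log_Q y\}} e^{-y}\, dy$. On each interval $Q^{x-m-1} < y \le Q^{x-m}$, $m \in \mathbb{Z}$, the integrand equals $Q^{m-x}\, y\, e^{-y}$, so the integral is a series of increments of the distribution function of $t_2$. The sketch below relies on this decomposition; the names `gamma2_cdf` and `periodic_F` are hypothetical, only $D = 2$ is handled, and the series is truncated to $|m| \le$ `terms`.

```python
from math import exp, expm1, log


def gamma2_cdf(y):
    """P(t_2 <= y) = 1 - (1 + y) e^{-y}, written to avoid cancellation for small y."""
    return -expm1(-y) - y * exp(-y)


def periodic_F(x, Q=2, terms=60):
    """Periodic function F_1 of the symmetric Q-ary algorithm with D = 2."""
    total = 0.0
    for m in range(-terms, terms + 1):
        upper = Q ** (x - m)           # right end of the m-th interval of integration
        # on (upper / Q, upper] the integrand is y e^{-y} / upper
        total += (gamma2_cdf(upper) - gamma2_cdf(upper / Q)) / upper
    return Q * Q / (Q - 1) * total


if __name__ == "__main__":
    grid = [k / 10 for k in range(10)]
    values = [periodic_F(x) for x in grid]
    print(min(values), max(values), 2 / log(2))
```

The average of $F_1$ over one period is $Q/\log Q$, in agreement with the nonarithmetic limit; for $Q = 2$ the oscillations around this level are very small, which is why they are easy to miss in simulations.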

The purpose of this section is to strengthen the above convergence. The 
distribution of the Poisson transform of the sequence $(R_n)$, that is, the
random variable $R_{\mathcal{N}([0,x])}$, is investigated and not only its average as before.
In particular it is shown that, for the Poisson transform, a standard law of 
large numbers can be used to prove the oscillating behavior of the algorithm. 
In other words, these uncommon laws of large numbers can be, in the end, 
expressed in a classical probabilistic setting. 



Notation. Throughout the rest of the paper it is assumed that:

(1) $\mathcal{N}$ is a Poisson process with intensity 1 on $\mathbb{R}_+$. Another Poisson process
will be used but in the two-dimensional space $[0, 1] \times \mathbb{R}_+$.

(2) The variable $\mathcal{M}$ denotes a Poisson process on $[0, 1] \times \mathbb{R}_+$ with intensity 1;
this is a distribution of random points on $[0, 1] \times \mathbb{R}_+$ with the following
properties: if $\mathcal{M}(H)$ denotes the number of points that "fall" into the
set $H \subset [0,1] \times \mathbb{R}_+$,

(i) For $x \in [0, 1] \times \mathbb{R}_+$, $\mathcal{M}(\{x\}) \in \{0, 1\}$.

(ii) If $G$ and $H$ are disjoint subsets of $[0, 1] \times \mathbb{R}_+$, the variables $\mathcal{M}(G)$
and $\mathcal{M}(H)$ are independent.

(iii) The distribution of the variable $\mathcal{M}([a, b] \times [y, z])$ is Poisson with
parameter $(b - a)(z - y)$ for $0 \le a \le b \le 1$ and $0 \le y \le z$.

Note that the random variables $\mathcal{N}([0,x])$ and $\mathcal{M}([0,1] \times [0, x])$ have a
Poisson distribution with parameter $x$.

(3) The Poisson transform of the sequence $(R_n)$ is denoted by $\mathcal{R}(x)$, $x \ge 0$,

$$\mathcal{R}(x) \stackrel{\text{dist.}}{=} R_{\mathcal{N}([0,x])} \stackrel{\text{dist.}}{=} R_{\mathcal{M}([0,1] \times [0,x])}.$$

Its expectation is given by (8). This section is devoted to the study of the
asymptotic behavior of the distribution of $\mathcal{R}(x)$.
4.1. Laws of large numbers. In this section it is proved that the Poisson
transform of the sequence $(R_n)$ satisfies a strong law of large numbers. A
nice representation of this transform as a functional of Poisson processes is 
first proved in the following proposition. 

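Proposition 13. The distribution of the Poisson transform $\mathcal{R}(x)$ of
the sequence $(R_n)$ satisfies the following relations:

$$\mathcal{R}(x) \stackrel{\text{dist.}}{=} \mathcal{R}_1(x) \stackrel{\text{def.}}{=} 1 + Q \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} \phi_{\mathcal{N}}\left(xk/Q^p,\, x(k+1)/Q^p\right), \tag{21}$$

where, for $0 \le a \le b$, $\phi_{\mathcal{N}}(a,b) = 1$ if $\mathcal{N}(]a,b]) \ge D$ and 0 otherwise,

$$\mathcal{R}(x) \stackrel{\text{dist.}}{=} \mathcal{R}_2(x) \stackrel{\text{def.}}{=} 1 + Q \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} 1_{\{\mathcal{M}(]k/Q^p,\,(k+1)/Q^p] \times [0,x]) \ge D\}}. \tag{22}$$

Note that the function $x \to \mathcal{R}_2(x)$ is clearly nondecreasing. In particular,
if $f$ is some nondecreasing function on $\mathbb{R}_+$, the same property holds for
$x \to \mathbb{E}[f(\mathcal{R}(x))]$.

Representation (21) will be useful to get a strong law of large numbers on
subsequences and also will be used to get the full convergence in distribution
of $\mathcal{R}(x)/x$ as $x$ tends to infinity.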
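
Representation (21) translates directly into a counting procedure. The
Python sketch below is only an illustration under stated assumptions: the
Poisson process is sampled from exponential gaps, half-open cells $[a,b[$ are
used instead of $]a,b]$ (the two differ on a null event), and the names
`poisson_points`, `tree_size_from_points` and the guard `max_levels` are
hypothetical.

```python
import random
from collections import Counter


def poisson_points(x, rng):
    """Points of a rate-1 Poisson process on [0, x], built from exponential gaps."""
    points, t = [], rng.expovariate(1.0)
    while t <= x:
        points.append(t)
        t += rng.expovariate(1.0)
    return points


def tree_size_from_points(points, x, Q, D, max_levels=64):
    """Evaluate 1 + Q * #{Q-adic subintervals of [0, x] holding >= D points}."""
    internal = 0
    for p in range(max_levels):  # guard against (numerically) coincident points
        cells = Q ** p
        # Index k of the generation-p cell [k x / Q^p, (k+1) x / Q^p[ of each point.
        counts = Counter(min(int(t / x * cells), cells - 1) for t in points)
        busy = sum(1 for c in counts.values() if c >= D)
        if busy == 0:  # no cell can be busy at deeper generations either
            break
        internal += busy
    return 1 + Q * internal


if __name__ == "__main__":
    rng = random.Random(1)
    pts = poisson_points(50.0, rng)
    print(len(pts), tree_size_from_points(pts, 50.0, 3, 2))
```

Each busy cell is an internal node of the tree and contributes its $Q$
children, which is why the size is always of the form $1 + Q \times$ (number of
busy cells).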

Proof of Proposition 13. By the splitting property of Poisson random
variables, the recurrence relation (2) for the sequence $(R_n)$ can be
expressed as

$$R_{\mathcal{N}(]0,x])} \stackrel{\text{dist.}}{=} 1 + 1_{\{\mathcal{N}(]0,x]) \ge D\}} \sum_{i=1}^{Q} R_{i,\,\mathcal{N}(](i-1)x/Q,\; ix/Q])}$$

for $x > 0$. If, for $0 \le a \le b$,

$$\Phi(a,b) = \frac{1}{Q}\left(R_{\mathcal{N}(]a,b])} - 1\right), \tag{23}$$

the last equation can be rewritten as

$$\Phi(0,x) = \sum_{i=1}^{Q} \Phi_i\left(\frac{i-1}{Q}x,\, \frac{i}{Q}x\right) + \phi_{\mathcal{N}}(0,x), \tag{24}$$

with an obvious notation with the subscripts $i$ for $\Phi$. By iterating this
relation, one gets that, almost surely, the expansion

$$\Phi(0,x) = \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} \phi_{\mathcal{N}}\left(\frac{k}{Q^p}x,\, \frac{k+1}{Q^p}x\right) \tag{25}$$

holds. The function $\Phi(0,x)$ is just the sum of the function $\phi_{\mathcal{N}}$ on the
Q-adic intervals of $[0,x]$.

Equation (22) is proved in the same way. □

Representation of some of the functionals of the associated tree. When
$\mathcal{N}([0,x])$ items are at the root of the associated tree, the total number of
nodes of the tree $R_{\mathcal{N}([0,x])}$ is not the only quantity that can be represented,
as in (21), in terms of the Poisson process $\mathcal{N}$.

The maximal depth $M(x)$ of the associated tree when there are $\mathcal{N}([0,x])$
items at the top of the tree can be expressed as a functional of the Poisson
process

$$M(x) = \max\{p \ge 1 : \exists k,\ 0 \le k \le Q^{p-1}-1,\ \mathcal{N}(]kx/Q^{p-1},\, (k+1)x/Q^{p-1}]) \ge D\}.$$

The quantity

$$F(x) = \max\{p \ge 1 : \forall k,\ 0 \le k \le Q^{p-1}-1,\ \mathcal{N}(]kx/Q^{p-1},\, (k+1)x/Q^{p-1}]) \ge D\}$$

is the number of full levels of the tree. See [22]. Note that these quantities
are directly related to classical occupancy problems. The number of nodes
at level $p \ge 1$ is given by

$$Q \sum_{k=0}^{Q^{p-1}-1} 1_{\{\mathcal{N}(]kx/Q^{p-1},\, (k+1)x/Q^{p-1}]) \ge D\}}.$$

This is not, of course, an exhaustive list of the possible representations in
terms of the Poisson process.
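
These functionals can all be read off a single table of per-generation
counts. The sketch below is an illustration, not the authors' procedure: it
assumes the points are given in $[0,x]$, uses half-open cells, and takes the
convention that the maximal depth is 0 when the root holds fewer than $D$
points. The names `busy_counts` and `tree_functionals` are hypothetical.

```python
from collections import Counter


def busy_counts(points, x, Q, D, max_levels=64):
    """busy[p] = number of generation-p Q-adic cells of [0, x] with >= D points."""
    busy = []
    for p in range(max_levels):
        cells = Q ** p
        counts = Counter(min(int(t / x * cells), cells - 1) for t in points)
        b = sum(1 for c in counts.values() if c >= D)
        if b == 0:
            break
        busy.append(b)
    return busy


def tree_functionals(points, x, Q, D):
    """Return (maximal depth, number of full levels, nodes per level)."""
    busy = busy_counts(points, x, Q, D)
    max_depth = len(busy)  # last p such that some cell of generation p-1 is busy
    # A generation is full when all its Q^p cells are busy; full generations
    # are necessarily the first ones, because a busy cell has a busy parent.
    full_levels = sum(1 for p, b in enumerate(busy) if b == Q ** p)
    nodes_per_level = [1] + [Q * b for b in busy]  # level 0 is the root
    return max_depth, full_levels, nodes_per_level


if __name__ == "__main__":
    print(tree_functionals([0.1, 0.2, 0.6, 0.7], 1.0, 2, 2))
```

Summing `nodes_per_level` recovers the total size $1 + Q \times$ (number of busy
cells) of representation (21).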

It is quite useful to think of splitting algorithms either in terms of trees 
or in terms of Q-adic subintervals of [0,1]. In a more general case, that is, 
when the splitting algorithm is not symmetrical, a representation similar 
to (21) can be obtained by using the associated random decomposition of 
the interval [0,1] instead of the Q-adic decomposition. See [11]. 

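A strong law of large numbers. Equation (24) shows that, if $N \ge 0$,
the quantity $\Phi(0, yQ^N)$ is the sum of the $\Phi$ on the intervals $[yp, yp+y]$,
$0 \le p < Q^N$, and of $\phi_{\mathcal{N}}$ on the intervals $[ykQ^n, y(k+1)Q^n]$ contained in
$[0, yQ^N]$, that is,

$$\Phi(0, yQ^N) = \sum_{p=0}^{Q^N-1} \Phi(yp, yp+y) + \sum_{n=1}^{N} \sum_{k=0}^{Q^{N-n}-1} \phi_{\mathcal{N}}\left(ykQ^n,\, y(k+1)Q^n\right). \tag{26}$$

By the independence properties of the Poisson process, the classical strong
law of large numbers shows that, almost surely,

$$\lim_{N \to +\infty} \frac{1}{Q^N} \sum_{p=0}^{Q^N-1} \Phi(yp, yp+y) = \mathbb{E}(\Phi(0,y)) = \sum_{n \ge 0} Q^n\, \mathbb{E}(\phi_{\mathcal{N}}(0, y/Q^n)) = \sum_{n \ge 0} Q^n\, \mathbb{P}(\mathcal{N}([0, y/Q^n]) \ge D),$$

by using (25), and for $n > 0$,

$$\lim_{N \to +\infty} \frac{1}{Q^N} \sum_{k=0}^{Q^{N-n}-1} \phi_{\mathcal{N}}\left(ykQ^n,\, y(k+1)Q^n\right) = \frac{1}{Q^n}\, \mathbb{P}(\mathcal{N}([0, yQ^n]) \ge D).$$

Note that, for $0 < K < N$,

$$\sum_{n=K}^{N} \frac{1}{Q^N} \sum_{k=0}^{Q^{N-n}-1} \phi_{\mathcal{N}}\left(ykQ^n,\, y(k+1)Q^n\right) \le \sum_{n=K}^{N} \frac{1}{Q^N}\, Q^{N-n} \le \frac{1}{Q^{K-1}}.$$

The three last identities and decomposition (26) give that, almost surely,

$$\lim_{N \to +\infty} \frac{\Phi(0, yQ^N)}{Q^N} = \sum_{n \in \mathbb{Z}} \frac{1}{Q^n}\, \mathbb{P}(\mathcal{N}([0, yQ^n]) \ge D).$$
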
Proposition 14 (Strong law of large numbers). With the same notation
as in Proposition 13, for $0 < y \le Q$, almost surely,

$$\lim_{N \to +\infty} \frac{\mathcal{R}_1(yQ^N)}{yQ^N} = \frac{Q}{y} \sum_{n \in \mathbb{Z}} \frac{1}{Q^n} \int_0^{yQ^n} \frac{u^{D-1}}{(D-1)!}\, e^{-u}\,du = F_1(\log_Q y), \tag{27}$$

where $F_1$ is the periodic function defined by (20).

As a by-product, the proposition establishes the intuitive (and classical) fact
that the sequence $(\mathbb{E}(R_n)/n)$ and the function $x \to \mathbb{E}(R_{\mathcal{N}([0,x])})/x$ have the
same asymptotic behavior at infinity. Note that if $G(y)$ is defined as the
second term of (27), then the function $x \to G(Q^x)$ is clearly periodic with
period 1.
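
The periodicity of $G$ can be checked numerically from the series in (27),
using $\int_0^{\lambda} u^{D-1} e^{-u}/(D-1)!\,du = \mathbb{P}(\mathcal{N}([0,\lambda]) \ge D)$. The
sketch below is a hedged illustration: the truncation level `n_max` and the
helper names `poisson_tail` and `G` are choices made here, not taken from the
paper.

```python
import math


def poisson_tail(lam, D):
    """P(Poisson(lam) >= D); summed directly for small lam to avoid cancellation."""
    if lam < 1.0:
        term = math.exp(-lam) * lam ** D / math.factorial(D)
        total = 0.0
        for j in range(D, D + 40):
            total += term
            term *= lam / (j + 1)
        return total
    term, cdf = math.exp(-lam), 0.0
    for j in range(D):
        cdf += term
        term *= lam / (j + 1)
    return 1.0 - cdf


def G(y, Q, D, n_max=60):
    """Truncated series (Q / y) * sum_{|n| <= n_max} Q^{-n} P(N([0, y Q^n]) >= D)."""
    total = sum(Q ** (-n) * poisson_tail(y * Q ** n, D)
                for n in range(-n_max, n_max + 1))
    return Q / y * total


if __name__ == "__main__":
    for y in (1.0, 1.3, 2.0, 2.6):
        print(y, G(y, 2, 2))
```

For $Q = D = 2$ the printed values should agree to many digits, since
$G(Qy) = G(y)$ and the oscillations of $G$ are very small in that case.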

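Proof of Proposition 14. Clearly, only the relation $F_1(\log_Q y) = G(y)$
has to be proved. For $n \in \mathbb{Z}$, if $t_D$ is the $D$th point of the Poisson
process $\mathcal{N}$, then

$$\mathbb{P}(\mathcal{N}([0, yQ^n]) \ge D) = \mathbb{P}(t_D \le yQ^n) = \int_0^{+\infty} 1_{\{u \le yQ^n\}} \frac{u^{D-1}}{(D-1)!}\, e^{-u}\,du.$$

By summing up these terms, with Fubini's theorem one gets

$$\begin{aligned}
G(y) &= \frac{Q}{y} \int_0^{+\infty} \sum_{n \in \mathbb{Z}} \frac{1}{Q^n}\, 1_{\{u \le yQ^n\}} \frac{u^{D-1}}{(D-1)!}\, e^{-u}\,du \\
&= \frac{Q^2}{Q-1} \int_0^{+\infty} \frac{1}{y\,Q^{\lceil \log_Q(u/y) \rceil}} \frac{u^{D-1}}{(D-1)!}\, e^{-u}\,du \\
&= \frac{Q^2}{Q-1} \int_0^{+\infty} Q^{-\{\log_Q(y/u)\}} \frac{u^{D-2}}{(D-1)!}\, e^{-u}\,du \\
&= F_1(\log_Q y).
\end{aligned}$$

The proposition is proved. □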
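
The Fubini step rests on a geometric summation. A worked version is given
below, with the shorthand $m = \lceil \log_Q(u/y) \rceil$ introduced here only for
illustration.

```latex
% For fixed u > 0 and y > 0, the constraint u <= y Q^n reads n >= m,
% where m = \lceil \log_Q(u/y) \rceil.
\begin{align*}
\sum_{n\in\mathbb{Z}} \frac{1}{Q^n}\,1_{\{u \le yQ^n\}}
  &= \sum_{n \ge m} Q^{-n} = \frac{Q}{Q-1}\,Q^{-m},\\
Q^{-m} &= Q^{-\log_Q(u/y)}\,Q^{-(m-\log_Q(u/y))}
        = \frac{y}{u}\,Q^{-\{\log_Q(y/u)\}},
\end{align*}
% since m - \log_Q(u/y) = \{\log_Q(y/u)\}. Multiplying by Q/y gives the kernel
\[
\frac{Q}{y}\cdot\frac{Q}{Q-1}\cdot\frac{y}{u}\,Q^{-\{\log_Q y-\log_Q u\}}
  = \frac{Q^2}{Q-1}\,\frac{1}{u}\,Q^{-\{\log_Q y-\log_Q u\}},
\]
% and the factor 1/u turns the density u^{D-1}/(D-1)! into u^{D-2}/(D-1)!.
```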

The following proposition establishes a weak law of large numbers for the
Poisson transform of the sequence $(R_n)$. Devroye [8] obtained related results
in a more general framework by using Talagrand's concentration inequalities.



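Theorem 15 (Law of large numbers). The following convergence in
distribution holds, for any $\varepsilon > 0$,

$$\lim_{x \to +\infty} \mathbb{P}\left(\left|\frac{\mathcal{R}(x)}{x F_1(\log_Q x)} - 1\right| > \varepsilon\right) = 0,$$

where $F_1$ is the function defined by (20).
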

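Proof. For $x > 0$, one defines $N_x = \lfloor \log_Q x \rfloor$, $u_x = x/Q^{N_x}$ and, for
$p \ge 1$, $z_x = \lfloor u_x p \rfloor Q^{N_x}/p$. Note that $\sup_{x \ge 1} |x/z_x - 1|$ converges to 0
as $p$ tends to infinity, hence by continuity of $F_1$,

$$\lim_{p \to +\infty} \sup_{x \ge 1} \left|\frac{x F_1(\log_Q x)}{z_x F_1(\log_Q z_x)} - 1\right| = 0.$$

Proposition 14 shows that for $p \ge 1$, almost surely, for $k$, $0 < k \le pQ$,

$$\lim_{N \to +\infty} \frac{\mathcal{R}_1(y_k Q^N)}{y_k Q^N F_1(\log_Q y_k)} = 1,$$

with $y_k = k/p$. Therefore, if $p \ge 1$ is fixed, almost surely,

$$\lim_{x \to +\infty} \frac{\mathcal{R}_1(z_x)}{z_x F_1(\log_Q z_x)} = 1.$$

Since $z_x \le x$, the monotonicity of the function $x \to \mathcal{R}_2(x)$ gives the relation

$$\mathcal{R}_2(z_x) \le \mathcal{R}_2(x).$$

One finally gets

$$\mathbb{P}\left(\frac{\mathcal{R}(x)}{x F_1(\log_Q x)} < 1 - \varepsilon\right) \le \mathbb{P}\left(\frac{\mathcal{R}_1(z_x)}{z_x F_1(\log_Q z_x)} \cdot \frac{z_x F_1(\log_Q z_x)}{x F_1(\log_Q x)} < 1 - \varepsilon\right),$$

therefore,

$$\lim_{x \to +\infty} \mathbb{P}\left(\frac{\mathcal{R}(x)}{x F_1(\log_Q x)} < 1 - \varepsilon\right) = 0.$$

The analogous inequality is obtained in the same way. The theorem is proved.
□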
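
The statement can be explored by simulation with the monotone representation
(22). The sketch below is a rough Monte Carlo illustration under stated
assumptions: the process $\mathcal{M}$ is sampled by attaching i.i.d. uniform
positions to the points of a rate-1 Poisson process on $[0, x_{\max}]$, and
$F_1(\log_Q x)$ is evaluated through the series in (27). All function names
and the truncation parameters are hypothetical.

```python
import math
import random
from collections import Counter


def poisson_tail(lam, D):
    """P(Poisson(lam) >= D); summed directly for small lam to avoid cancellation."""
    if lam < 1.0:
        term = math.exp(-lam) * lam ** D / math.factorial(D)
        total = 0.0
        for j in range(D, D + 40):
            total += term
            term *= lam / (j + 1)
        return total
    term, cdf = math.exp(-lam), 0.0
    for j in range(D):
        cdf += term
        term *= lam / (j + 1)
    return 1.0 - cdf


def F1_at_log(x, Q, D, n_max=60):
    """F_1(log_Q x), from the series in (27) at y = Q^{frac(log_Q x)}."""
    y = Q ** (math.log(x, Q) % 1.0)
    total = sum(Q ** (-n) * poisson_tail(y * Q ** n, D)
                for n in range(-n_max, n_max + 1))
    return Q / y * total


def sample_M(x_max, rng):
    """Points (u, s) of a rate-1 Poisson process on [0, 1] x [0, x_max]."""
    pts, s = [], rng.expovariate(1.0)
    while s <= x_max:
        pts.append((rng.random(), s))
        s += rng.expovariate(1.0)
    return pts


def R2(pts, x, Q, D, max_levels=64):
    """Representation (22) restricted to the points with second coordinate <= x."""
    us = [u for (u, s) in pts if s <= x]
    internal = 0
    for p in range(max_levels):
        cells = Q ** p
        counts = Counter(int(u * cells) for u in us)
        busy = sum(1 for c in counts.values() if c >= D)
        if busy == 0:
            break
        internal += busy
    return 1 + Q * internal


if __name__ == "__main__":
    rng = random.Random(0)
    pts = sample_M(4000.0, rng)
    for x in (1000.0, 2000.0, 4000.0):
        print(x, R2(pts, x, 2, 2) / (x * F1_at_log(x, 2, 2)))
```

Because the same points are reused for every $x$, the values `R2(pts, x, ...)`
are nondecreasing in $x$, which is the coupling used in the proof.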

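4.2. Central limit theorems. With $\Phi$ defined by (23), the variance of the
variable $\Phi(0,x)$, $x > 0$, is first analyzed. The expansion (25) gives

$$[\Phi(0,x) - \mathbb{E}(\Phi(0,x))]^2 = \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} \sum_{p' \ge 0} \sum_{k'=0}^{Q^{p'}-1} \Delta_{k,p}(x)\, \Delta_{k',p'}(x),$$

with

$$\Delta_{k,p}(x) = \phi_{\mathcal{N}}\left(\frac{k}{Q^p}x,\, \frac{k+1}{Q^p}x\right) - \mathbb{E}\left(\phi_{\mathcal{N}}\left(0,\, \frac{1}{Q^p}x\right)\right).$$

The expected value of the variable $\Delta_{k,p}(x)\Delta_{k',p'}(x)$ is nonzero only if $p \le p'$
and $kQ^{p'-p} \le k' \le (k+1)Q^{p'-p} - 1$, or under the symmetrical condition obtained
by exchanging $(p,k)$ and $(p',k')$: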

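$$\begin{aligned}
\mathbb{E}[(\Phi(0,x) - \mathbb{E}(\Phi(0,x)))^2]
&= \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} \mathbb{E}[\Delta_{k,p}(x)^2] \\
&\quad + 2 \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} \sum_{p' > p}\ \sum_{k'=kQ^{p'-p}}^{(k+1)Q^{p'-p}-1} \mathbb{E}[\Delta_{k,p}(x)\, \Delta_{k',p'}(x)].
\end{aligned}$$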

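By using the elementary identities

$$\mathbb{E}[\Delta_{k,p}(x)^2] = \mathbb{E}[\phi_{\mathcal{N}}(0, x/Q^p)]\,(1 - \mathbb{E}[\phi_{\mathcal{N}}(0, x/Q^p)]) = \mathbb{P}(t_D \le x/Q^p)\, \mathbb{P}(s_D > x/Q^p),$$

$$\mathbb{E}[\Delta_{k,p}(x)\, \Delta_{k',p'}(x)] = \mathbb{E}[\phi_{\mathcal{N}}(0, x/Q^{p'})]\,(1 - \mathbb{E}[\phi_{\mathcal{N}}(0, x/Q^p)]) = \mathbb{P}(t_D \le x/Q^{p'})\, \mathbb{P}(s_D > x/Q^p),$$

where $t_D$ and $s_D$ are independent random variables with the same distribution
as the $D$th point of the Poisson process $\mathcal{N}$, one gets the relation

$$\mathbb{E}[(\Phi(0,x) - \mathbb{E}(\Phi(0,x)))^2] = \sum_{p \ge 0} Q^p\, \mathbb{P}(t_D \le x/Q^p < s_D) + 2 \sum_{p' > p \ge 0} Q^{p'}\, \mathbb{P}(t_D \le x/Q^{p'},\ x/Q^p < s_D).$$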
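
The identity for the cross terms follows from the nesting of the Q-adic
intervals. A short derivation is sketched below, with the shorthand
$I = ]kx/Q^p, (k+1)x/Q^p]$, $I' = ]k'x/Q^{p'}, (k'+1)x/Q^{p'}]$, $\phi = 1_{\{\mathcal{N}(I) \ge D\}}$ and
$\phi' = 1_{\{\mathcal{N}(I') \ge D\}}$, introduced here only for illustration.

```latex
% Assumption: p < p' and k Q^{p'-p} <= k' <= (k+1) Q^{p'-p} - 1, so that I' \subset I.
% Then N(I') >= D forces N(I) >= D, that is, \phi\,\phi' = \phi'.
\begin{align*}
\mathbb{E}[\Delta_{k,p}(x)\,\Delta_{k',p'}(x)]
  &= \mathbb{E}[\phi\,\phi'] - \mathbb{E}[\phi]\,\mathbb{E}[\phi'] \\
  &= \mathbb{E}[\phi'] - \mathbb{E}[\phi]\,\mathbb{E}[\phi']
   = \mathbb{E}[\phi']\,\bigl(1 - \mathbb{E}[\phi]\bigr).
\end{align*}
% If I and I' are disjoint, \phi and \phi' are independent and the covariance is 0.
% For a fixed pair p < p' there are Q^p choices of k and Q^{p'-p} nested k',
% hence the factor Q^{p'} in the double sum.
```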



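By switching again the series and the expected values, one finally obtains

$$\begin{aligned}
(Q-1)\, \mathbb{E}[(\Phi(0,x) - \mathbb{E}(\Phi(0,x)))^2]
&= Q\, \mathbb{E}\left(\left(Q^{\lfloor \log_Q(x/t_D) \rfloor} - Q^{\lfloor \log_Q(x/s_D) \rfloor}\right) 1_{\{\lfloor \log_Q(x/t_D) \rfloor > \lfloor \log_Q(x/s_D) \rfloor\}}\right) \\
&\quad + 2Q\, \mathbb{E}\left(\left(\lfloor \log_Q(x/t_D) \rfloor - \lfloor \log_Q(x/s_D) \rfloor - 1\right)^+ Q^{\lfloor \log_Q(x/t_D) \rfloor}\right) \\
&\quad - \frac{2Q}{Q-1}\, \mathbb{E}\left(\left(Q^{\lfloor \log_Q(x/t_D) \rfloor} - Q^{\lfloor \log_Q(x/s_D) \rfloor + 1}\right)^+\right),
\end{aligned} \tag{28}$$

where $a^+ = \max(a, 0)$ for $a \in \mathbb{R}$. This identity gives the following proposition.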
A similar proposition has been proved by Jacquet and Regnier [17] and 
Regnier and Jacquet [37] in the case where Q = D = 2 but without symmetry 
conditions as is the case here. See also [28], Chapter 5. 

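Proposition 16 (Asymptotic variance). The variance of the Poisson
transform of the sequence $(R_n)$ satisfies the following equivalence, as $x$ goes
to infinity:

$$\frac{1}{x} \operatorname{Var}(\mathcal{R}(x)) \sim F_2(\log_Q x),$$

where $F_2$ is the continuous periodic function with period 1 defined by, for
$y \ge 0$,

$$F_2(y) = \int_{\mathbb{R}_+^2} f_2\left(\{y - \log_Q u\},\, \{y - \log_Q v\},\, u,\, v\right) \frac{u^{D-1}}{(D-1)!}\, \frac{v^{D-1}}{(D-1)!}\, e^{-(u+v)}\,du\,dv, \tag{29}$$

with $\{z\} = z - \lfloor z \rfloor$ for $z \in \mathbb{R}$ and, for $u > 0$, $v > 0$ and $a, b \in \mathbb{R}$,

$$\begin{aligned}
f_2(a,b,u,v) &= \frac{Q^3}{Q-1} \left(\frac{Q^{-a}}{u} - \frac{Q^{-b}}{v}\right) 1_{\{\log_Q(v/u) + b > a\}} \\
&\quad + \frac{2Q^3}{Q-1}\, \frac{Q^{-a}}{u} \left(\log_Q(v/u) - a + b - 1\right)^+ \\
&\quad - \frac{2Q^3}{(Q-1)^2} \left(\frac{Q^{-a}}{u} - \frac{Q^{1-b}}{v}\right)^+,
\end{aligned}$$

where $z^+ = \max(z, 0)$.
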

Note that a more detailed expansion of the variance could be obtained 
with (28). 

Proposition 17 (Central limit theorem for Poisson transform). For
$0 < y \le Q$, as $N$ tends to infinity, the variable

$$\frac{\mathcal{R}(yQ^N) - \mathbb{E}(\mathcal{R}(yQ^N))}{\sqrt{Q^N}}$$

converges in distribution to a Gaussian centered random variable with
variance $y F_2(\log_Q y)$, where $F_2$ is defined by (29).

Proof. It is enough to prove the proposition for the variable $\Phi(0,x)$
defined by

$$\Phi(0,x) = \frac{1}{Q}\left(R_{\mathcal{N}(]0,x])} - 1\right).$$

Equation (26) gives, for $K \ge 1$,

$$\begin{aligned}
\Phi(0, yQ^N) - \mathbb{E}(\Phi(0, yQ^N))
&= \sum_{p=0}^{Q^N-1} \left[\Phi(yp, yp+y) - \mathbb{E}(\Phi(0,y))\right] \\
&\quad + \sum_{n=1}^{K} \sum_{k=0}^{Q^{N-n}-1} \left[\phi_{\mathcal{N}}\left(ykQ^n,\, y(k+1)Q^n\right) - \mathbb{E}(\phi_{\mathcal{N}}(0, yQ^n))\right] + \Delta_K,
\end{aligned}$$

where $\Delta_K$ is the residual term of the series. By using the method to compute
the variance, it is not difficult to establish that, for any $\varepsilon > 0$ there exists
some $K > 0$ such that the expected value of $(\Delta_K/Q^{N/2})^2$ is less than $\varepsilon$, for
$N$ sufficiently large.

By regrouping the terms of the above equation according to the Q-adic
intervals $[ykQ^K, y(k+1)Q^K]$ for $0 \le k < Q^{N-K}$ and by using the independence
properties of the Poisson process $\mathcal{N}$, the quantity
$\Phi(0, yQ^N) - \mathbb{E}(\Phi(0, yQ^N)) - \Delta_K$ can be written as a sum of $Q^{N-K}$ independent
identically distributed random variables. Therefore, the classical central limit
theorem can be applied. The proposition is proved. □
4.3. The distribution of the sequence $(R_n)$. The following proposition
describes the distribution of the variable Rn in terms of n i.i.d. uniformly 
distributed random variables on the interval [0,1]. This characterization is 
generally implicitly used to get various asymptotics describing the depth of 
the associated tree. See [28] and [35]. 

Proposition 18. For $n \ge 0$, the random variable $R_n$ has the same
distribution as

$$R_n \stackrel{\text{dist.}}{=} 1 + Q \sum_{p \ge 0} \sum_{k=0}^{Q^p-1} \phi_{\mathcal{U}_n}\left(]k/Q^p,\, (k+1)/Q^p]\right), \tag{30}$$

where, for $0 \le a \le b \le 1$, $\phi_{\mathcal{U}_n}(]a,b]) = 1$ if $\mathcal{U}_n(]a,b]) \ge D$ and 0 otherwise.
The variable $\mathcal{U}_n$ is the point measure on $[0,1]$ defined by

$$\mathcal{U}_n = \delta_{U_1} + \cdots + \delta_{U_n},$$

where $(U_1, \ldots, U_n)$ are i.i.d. random variables uniformly distributed on $[0,1]$; in
particular, $\mathcal{U}_n(]a,b])$ is the number of $U_i$'s in the interval $]a,b]$.

Proof. Assume that $\mathcal{N}$ is a Poisson process with parameter 1; by
definition

$$\left(R_{\mathcal{N}(]0,x])} \mid \mathcal{N}(]0,x]) = n\right) \stackrel{\text{dist.}}{=} R_n.$$

Due to Proposition 13, the distribution of the Poisson transform $R_{\mathcal{N}([0,x])}$
is expressed as a functional of the points of the Poisson process on the
interval $[0,x]$. But, as in the proof of Proposition 6, conditionally on the
event $\{\mathcal{N}(]0,x]) = n\}$, these points can be expressed as $xU_i$, $1 \le i \le n$, where
$(U_i)$ are i.i.d. uniformly distributed random variables on the interval $[0,1]$.
Equation (30) is thus a direct consequence of (21). □
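
In tree form, (30) says that $R_n$ can be sampled by splitting $n$ uniform
points on their successive Q-ary digits. The recursive sketch below is an
illustration under stated assumptions: floating-point digits are used, a depth
guard `max_depth` is added, and the names `tree_size_uniform` and `sample_R`
are hypothetical.

```python
import random


def tree_size_uniform(us, Q, D, depth=0, max_depth=60):
    """Size of the tree built on points us of [0, 1) by Q-ary digit splitting."""
    if len(us) < D or depth >= max_depth:
        return 1  # a node with fewer than D items is a leaf
    groups = [[] for _ in range(Q)]
    for u in us:
        digit = min(int(u * Q), Q - 1)       # index of the Q-adic subinterval
        groups[digit].append(u * Q - digit)  # rescale that subinterval to [0, 1)
    return 1 + sum(tree_size_uniform(g, Q, D, depth + 1, max_depth) for g in groups)


def sample_R(n, Q, D, rng):
    """One sample of R_n from n i.i.d. uniform points."""
    return tree_size_uniform([rng.random() for _ in range(n)], Q, D)


if __name__ == "__main__":
    rng = random.Random(3)
    samples = [sample_R(2, 2, 2, rng) for _ in range(20000)]
    print(sum(samples) / len(samples))
```

For $n = Q = D = 2$ the number of internal nodes is geometric with parameter
$1/2$, so the empirical mean should be close to 5.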